This paper proposes a denoising algorithm based on non-local image statistics and patch repetition, combining the advantages of NL-means and Exponentially Weighted Aggregation (EWA). The aggregated estimator is computed using MCMC, and the results are comparable to state-of-the-art algorithms. Pluses: (1) the method seems simple and straightforward to implement. Minuses: in terms of explaining *why* the method works, the text leaves something to be desired, e.g., in the second paragraph of Section 7 ("The proposed implementation proceeds in two identical iterations.").
Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling
Trae Research Team, Gao, Pengfei, Tian, Zhao, Meng, Xiangxin, Wang, Xinchen, Hu, Ruida, Xiao, Yuanan, Liu, Yizhou, Zhang, Zhao, Chen, Junjie, Gao, Cuiyun, Lin, Yun, Xiong, Yingfei, Peng, Chao, Liu, Xia
Software issue resolution is a critical challenge in software engineering and has garnered increasing attention in recent years. With the rapid advancement of large language models (LLMs), substantial progress has been made in addressing real-world software engineering tasks. Recent studies have introduced ensemble reasoning techniques to enhance the performance of LLM-based issue resolution. However, existing prompting-based methods still face limitations in effectively exploring large ensemble spaces and lack the capacity for repository-level understanding, both of which constrain their overall effectiveness. In this paper, we propose Trae Agent, the first agent-based ensemble reasoning approach for repository-level issue resolution. Trae Agent formulates our goal as an optimal solution search problem and addresses two key challenges, i.e., large ensemble spaces and repository-level understanding, through modular agents for generation, pruning, and selection. We conduct extensive experiments using three leading LLMs on the widely-adopted SWE-bench benchmark, comparing Trae Agent against four state-of-the-art ensemble reasoning techniques. Experimental results demonstrate that Trae Agent consistently achieves superior performance, with an average improvement of 10.22% over all baselines in terms of Pass@1. Trae Agent has achieved first place on the SWE-bench Verified leaderboard, with a notable Pass@1 score of 75.20%. We are pleased to release Trae Agent as an open-source project to support the research community, with all resources available at https://github.com/bytedance/trae-agent.
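The generate, prune, and select agents described in the abstract can be sketched as a toy pipeline. This is an illustrative sketch only: in Trae Agent the three modules are LLM-driven, and the function names, scoring rule, and pruning check below are hypothetical stand-ins, not the project's actual API.

```python
def generate(issue, n=5):
    """Stand-in for the generation agent: produce n candidate patches."""
    return [f"patch-{i} for {issue}" for i in range(n)]

def prune(candidates, plausible):
    """Stand-in for the pruning agent: drop candidates failing a cheap check."""
    return [c for c in candidates if plausible(c)]

def select(candidates, score):
    """Stand-in for the selection agent: commit to the best survivor."""
    return max(candidates, key=score) if candidates else None

issue = "issue-42"
pool = generate(issue)
# Hypothetical pruning rule: reject the first (say, non-compiling) candidate.
survivors = prune(pool, plausible=lambda c: not c.startswith("patch-0"))
# Hypothetical score: the candidate's index, parsed back out of its name.
best = select(survivors, score=lambda c: int(c.split("-")[1].split()[0]))
print(best)
```

The point of the modular split is that each stage can shrink the ensemble space before the next, more expensive stage runs.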
- North America > United States > New York > Rockland County > Pearl River (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- Asia > China > Beijing > Beijing (0.04)
SemAgent: A Semantics Aware Program Repair Agent
Pabba, Anvith, Mathai, Alex, Chakraborty, Anindya, Ray, Baishakhi
Large Language Models (LLMs) have shown impressive capabilities in downstream software engineering tasks such as Automated Program Repair (APR). In particular, there has been a lot of research on repository-level issue-resolution benchmarks such as SWE-Bench. Although there has been significant progress on this topic, we notice that in the process of solving such issues, existing agentic systems tend to hyper-localize on immediately suspicious lines of code and fix them in isolation, without a deeper understanding of the issue semantics, code semantics, or execution semantics. Consequently, many existing systems generate patches that overfit to the user issue, even when a more general fix is preferable. To address this limitation, we introduce SemAgent, a novel workflow-based procedure that leverages issue, code, and execution semantics to generate patches that are complete - identifying and fixing all lines relevant to the issue. We achieve this through a novel pipeline that (a) leverages execution semantics to retrieve relevant context, (b) comprehends issue-semantics via generalized abstraction, (c) isolates code-semantics within the context of this abstraction, and (d) leverages this understanding in a two-stage architecture: a repair stage that proposes fine-grained fixes, followed by a reviewer stage that filters relevant fixes based on the inferred issue-semantics. Our evaluations show that our methodology achieves a solve rate of 44.66% on the SWEBench-Lite benchmark beating all other workflow-based approaches, and an absolute improvement of 7.66% compared to our baseline, which lacks such deep semantic understanding. We note that our approach performs particularly well on issues requiring multi-line reasoning (and editing) and edge-case handling, suggesting that incorporating issue and code semantics into APR pipelines can lead to robust and semantically consistent repairs.
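The two-stage architecture in the abstract (a repair stage proposing fine-grained fixes, then a reviewer stage filtering them against the inferred issue semantics) can be illustrated with stubs. SemAgent's real stages are LLM-based; the dictionaries and field names below are invented for the sketch.

```python
def repair_stage(suspicious_lines):
    # Propose one fine-grained candidate fix per suspicious line.
    return [{"line": ln, "fix": f"guard input on line {ln}"} for ln in suspicious_lines]

def reviewer_stage(fixes, issue_semantics):
    # Keep only fixes whose target falls inside the semantic scope of the issue,
    # rejecting hyper-localized edits that overfit to one suspicious line.
    return [f for f in fixes if f["line"] in issue_semantics["relevant_lines"]]

# Hypothetical abstraction of the user issue (stand-in for step (b) in the pipeline).
issue_semantics = {"summary": "missing None check", "relevant_lines": {10, 12}}
candidates = repair_stage([10, 11, 12])
accepted = reviewer_stage(candidates, issue_semantics)
print([f["line"] for f in accepted])
```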
Using ML filters to help automated vulnerability repairs: when it helps and when it doesn't
Camporese, Maria, Massacci, Fabio
Authors: Maria Camporese, University of Trento (Italy); Fabio Massacci, University of Trento (Italy) and Vrije Universiteit Amsterdam (The Netherlands). This work has been partly supported by the European Union (EU) under Horizon Europe grant n . This paper reflects only the authors' view, and the funders are not responsible for any use that may be made of the information contained therein. As artificial intelligence (AI) becomes omnipresent, even within secure software development, the safety of digital infrastructures requires new technologies and methodologies, as highlighted in the EU Strategic Plan 2021-2024. To achieve this goal, the EU-funded Sec4AI4Sec project will develop advanced security-by-design testing and assurance techniques tailored to AI-augmented systems. These systems can democratise security expertise, enabling intelligent, automated secure coding and testing while lowering development costs and improving software quality. However, they also introduce unique security challenges, particularly concerning fairness and explainability. Sec4AI4Sec tackles these challenges with a comprehensive approach, embodying the vision of better security for AI and better AI for security. Hybrid Explainable Workflows for Security and Threat Intelligence (HEWSTI): in research into threats to safety and security, people and AI collaborate to obtain actionable intelligence.
- Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.45)
- Europe > Netherlands > North Holland > Amsterdam (0.25)
- Information Technology > Software Engineering (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Zhang, Kexun, Yao, Weiran, Liu, Zuxin, Feng, Yihao, Liu, Zhiwei, Murthy, Rithesh, Lan, Tian, Li, Lei, Lou, Renze, Xu, Jiacheng, Pang, Bo, Zhou, Yingbo, Heinecke, Shelby, Savarese, Silvio, Wang, Huan, Xiong, Caiming
Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. To fully harness the diversity of these agents, we propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise. DEI functions as a meta-module atop existing SWE agent frameworks, managing agent collectives for enhanced problem-solving. Experimental results show that a DEI-guided committee of agents is able to surpass the best individual agent's performance by a large margin. For instance, a group of open-source SWE agents, with a maximum individual resolve rate of 27.3% on SWE-Bench Lite, can achieve a 34.3% resolve rate with DEI, a 25% improvement that beats most closed-source solutions. Our findings contribute to the growing body of research on collaborative AI systems and their potential to solve complex software engineering challenges. Recent advancements in large language models (LLMs) have transformed software engineering (SWE) and other domains.
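The meta-module idea can be sketched as candidate re-ranking across agents: each agent proposes a patch, and a committee score decides which one to commit. The scores below are placeholders for DEI's learned, LLM-based reviewer; the names are illustrative, not DEI's API.

```python
# One candidate patch per agent (stand-ins for real SWE agent outputs).
agent_outputs = {
    "agent_a": "patch A",
    "agent_b": "patch B",
    "agent_c": "patch C",
}
# Hypothetical committee scores; in DEI these come from an LLM-based reviewer.
committee_scores = {"patch A": 0.4, "patch B": 0.9, "patch C": 0.6}

def dei_select(outputs, scores):
    # The meta-module re-ranks the pooled candidates and picks the best one,
    # so the committee can outperform any single agent's fixed choice.
    return max(outputs.values(), key=scores.__getitem__)

print(dei_select(agent_outputs, committee_scores))
```

The gain over the best individual agent comes entirely from the pool being more diverse than any one agent's samples.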
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models
Ruiz, Fernando Vallecillos, Grishina, Anastasiia, Hort, Max, Moonen, Leon
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back using neural machine translation with language models. We investigate whether this correction capability of Large Language Models (LLMs) extends to Automatic Program Repair (APR). Current generative models for APR are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back. We hypothesize that RTT with LLMs restores the most commonly seen patterns in code during pre-training, i.e., performs a regression toward the mean, which removes bugs as they are a form of noise w.r.t. the more frequent, natural, bug-free code in the training data. To test this hypothesis, we employ eight recent LLMs pre-trained on code, including the latest GPT versions, and four common program repair benchmarks in Java. We find that RTT with English as an intermediate language repaired 101 of 164 bugs with GPT-4 on the HumanEval-Java dataset. Moreover, 46 of these are unique bugs that are not repaired by other LLMs fine-tuned for APR. Our findings highlight the viability of round-trip translation with LLMs as a technique for automated program repair and its potential for research in software engineering. Keywords: automated program repair, large language model, machine translation
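The round trip described above (code to an intermediate language and back) can be illustrated with trivial table-lookup "translators". Real RTT uses an LLM for both directions; the dictionaries here are invented, and the example only shows the pipeline shape: regenerating a snippet from its description drops the bug as noise relative to the more natural form.

```python
# Stub forward translation: code -> natural-language description.
INTERMEDIATE = {"if x <= len(xs):": "check that x is a valid index into xs"}
# Stub backward translation: description -> the most "natural" code form.
BACK = {"check that x is a valid index into xs": "if x < len(xs):"}

def forward(code):
    return INTERMEDIATE.get(code, code)

def backward(description):
    return BACK.get(description, description)

buggy = "if x <= len(xs):"           # off-by-one bug
repaired = backward(forward(buggy))  # the round trip regenerates the common form
print(repaired)
```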
- Europe > Norway > Eastern Norway > Oslo (0.04)
- North America > United States > Virginia (0.04)
Conversational Automated Program Repair
Xia, Chunqiu Steven, Zhang, Lingming
Automated Program Repair (APR) can help developers automatically generate patches for bugs. Due to the impressive performance of Large Pre-Trained Language Models (LLMs) on many code-related tasks, researchers have started to use LLMs directly for APR. However, prior approaches simply sample the LLM repeatedly given the same constructed input/prompt created from the original buggy code, which not only generates the same incorrect patches repeatedly but also misses the critical information in test cases. To address these limitations, we propose conversational APR, a new paradigm for program repair that alternates between patch generation and validation in a conversational manner. In conversational APR, we iteratively build the input to the model by combining previously generated patches with validation feedback. As such, we leverage the long context window of LLMs not only to avoid generating previously incorrect patches but also to incorporate validation feedback that helps the model understand the semantic meaning of the program under test. We evaluate 10 different LLMs, including the newly developed ChatGPT model, to demonstrate the improvement of conversational APR over the prior LLM-for-APR approach. Bugs in software can cause significant financial losses Matteson (2018) and create dangerous health and safety problems Hanbury (2019).
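The alternation between generation and validation can be sketched as a loop in which test failures are appended to the conversation, so the model never re-proposes a rejected patch. The model and test suite below are stubs (the real paradigm queries an LLM with the growing dialogue); the fixed candidate list is purely illustrative.

```python
def model(prompt_history):
    # Stub LLM: proposes candidate bodies for a buggy `add(a, b)` in a fixed
    # order, skipping any candidate already rejected in the conversation.
    for candidate in ["return a - b", "return a * b", "return a + b"]:
        if all(candidate not in turn for turn in prompt_history):
            return candidate
    return None

def validate(patch):
    # Stub test suite: the patched function must return 5 for add(2, 3).
    body = patch.removeprefix("return ")
    return eval(body, {"a": 2, "b": 3}) == 5

history, patch = [], None
for _ in range(5):
    patch = model(history)
    if validate(patch):
        break
    history.append(f"patch `{patch}` failed the tests")  # validation feedback

print(patch)
```

After two failed turns the history rules out both wrong patches, and the third proposal passes validation.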
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Illinois (0.04)
CURE: Code-Aware Neural Machine Translation for Automatic Program Repair
Jiang, Nan, Lutellier, Thibaud, Tan, Lin
Automatic program repair (APR) is crucial to improve software reliability. Recently, neural machine translation (NMT) techniques have been used to fix software bugs automatically. While promising, these approaches have two major limitations. Their search space often does not contain the correct fix, and their search strategy ignores software knowledge such as strict code syntax. Due to these limitations, existing NMT-based techniques underperform the best template-based approaches. We propose CURE, a new NMT-based APR technique with three major novelties. First, CURE pre-trains a programming language (PL) model on a large software codebase to learn developer-like source code before the APR task. Second, CURE designs a new code-aware search strategy that finds more correct fixes by focusing on compilable patches and patches that are close in length to the buggy code. Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes. Our evaluation on two widely-used benchmarks shows that CURE correctly fixes 57 Defects4J bugs and 26 QuixBugs bugs, outperforming all existing APR techniques on both benchmarks.
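Two of CURE's code-aware ideas, keeping only compilable candidates and preferring patches close in length to the buggy code, can be shown as a re-ranking step. CURE applies these inside beam search over Java; this sketch re-ranks Python one-liners instead, using `ast.parse` as a stand-in compilability check, and the candidate strings are invented.

```python
import ast

def compilable(src):
    # Proxy for CURE's compilability check: does the snippet parse?
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

def code_aware_rank(candidates, buggy):
    valid = [c for c in candidates if compilable(c)]
    # Prefer candidates whose token length is closest to the buggy code's.
    return sorted(valid, key=lambda c: abs(len(c.split()) - len(buggy.split())))

buggy = "x = y +"   # truncated, buggy line
candidates = ["x = y + 1", "x = y ++", "x = y + z + extra_term_here"]
print(code_aware_rank(candidates, buggy)[0])
```

The syntactically invalid candidate is filtered out, and of the two that parse, the one closest in length to the buggy line wins.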
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Stochastic Patching Process
Fan, Xuhui, Li, Bin, Wang, Yi, Wang, Yang, Chen, Fang
Stochastic partition models divide a product space into a number of rectangular regions such that the data within each region exhibit certain types of homogeneity. Due to the constraints of their partition strategies, existing models may cause unnecessary dissections in sparse regions when fitting data in dense regions. To alleviate this limitation, we propose a parsimonious partition model, named the Stochastic Patching Process (SPP), to deal with multi-dimensional arrays. SPP adopts an "enclosing" strategy that attaches rectangular patches to dense regions. SPP is self-consistent, so it can be extended to infinite arrays. We apply SPP to relational modeling, and the experimental results validate its merit compared to the state of the art.
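The "enclosing" strategy can be illustrated deterministically: instead of recursively cutting the whole space, attach a rectangular patch that tightly encloses a dense region of a 2-D array. This is a toy reduction only; SPP itself is a Bayesian nonparametric prior over such patches, and the single-patch, fixed-threshold setting below is a simplification.

```python
def enclose_dense(matrix, threshold=1):
    # Bounding rectangle of all cells whose value meets the density threshold.
    cells = [(i, j) for i, row in enumerate(matrix)
                    for j, v in enumerate(row) if v >= threshold]
    if not cells:
        return None
    rows, cols = zip(*cells)
    return (min(rows), min(cols), max(rows), max(cols))  # (top, left, bottom, right)

data = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(enclose_dense(data))  # the patch encloses the dense 2x2 block
```

Note how the sparse border cells never trigger a cut, which is exactly the dissection that axis-aligned recursive partitioning would waste.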
- Asia > Middle East > Jordan (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > California (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Communications > Social Media (0.94)
- Information Technology > Data Science (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)